                       "How does it work" file for U7WIN9X
                                 August 2000
                               Gilbert Rouqui
                          <e-mail address to be filled> 
                            <dragon name to be filled>
  
|                                Version 1.20
  
  
  For the curious and technically minded : How U7WIN9X Works
  ----------------------------------------------------------
  
  -- What Flat Real Mode is --
  ----------------------------
  In this discussion 386 stands for the Intel 386 Architecture, whose CPU
  representatives are the 80386, 80486 and Pentium (including MMX, Pro, II,
  Xeon and III), and compatible CPUs such as the AMD K5, K6, Athlon ...
  
  VOODOO, as named by Origin Inc, and as present in Ultima 7 Part I and Part II
  is the use of 32 bit wide memory addresses to read and write from and to any 
  memory location of the 386 Addressing Space (4 GB), and this from a DOS
  program, which inherently uses 16 bit wide addresses.
  
  32 bit wide addresses can always be used from a 16 bit native program, a DOS
  program such as Ultima 7, because the 386 allows an instruction to contain
  the override byte 0x67 (ADR32) whose aim is to notify the processor that the 
  current instruction is to use 32 bit addressing. (Strictly speaking, this
  is a toggle. When used while the machine is in 32 bit mode, it instead
  instructs the processor to use 16 bit addressing for the instruction.)
  
  BUT even an access through a 32 bit address within the current DS segment
  must pass the DS segment Limit check, and in DOS real mode and Virtual 86
  mode that limit is 64k. So simply said, you may be free to use 32 bit
  addresses but this will not help you in any way, because any address above
  64k (that is, wider than 16 bits) will trap with a General Protection Fault.
  
  This is where Flat Real Mode, also named Unreal Mode, comes in. It looks like
  a bug in the 386, but so many programs know about it and use (I would say
  abuse) it, that Intel cannot fix it. Here it is. 
  
  The DS or ES segment limit of 64k is stored into the segment descriptor cache
  at power-on time, when the CPU starts in Real Mode. But if a program switches
  from Real Mode to Protected Mode, creates a segment descriptor with a limit
  larger than 64k, for example 4G, sets that descriptor into, say, DS and ES,
  and quickly reverts to Real Mode, then the segment limit in the descriptor
  cache, set by the setting of DS and ES in protected mode, is NOT REVERTED
  back to 64k when the processor returns to Real Mode. Since no Real Mode
  instruction ever modifies the segment limit in the descriptor cache, it
  remains set to the value it received from the descriptor in Protected Mode.
  Thus, in the example with DS and ES set to a 4G limit, the CPU allows free
  access through DS and ES to any location in the 4G range.
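
  To make the mechanism concrete, here is a toy model in Python. It is not
  CPU code and not part of U7WIN9X; the class and method names are invented
  for illustration. The only behaviour it tries to show is the one described
  above: a Real Mode segment load recomputes the base but leaves the cached
  limit alone, so a limit set in Protected Mode survives the return to Real
  Mode.

```python
LIMIT_64K = 0xFFFF        # cached limit at power-on
LIMIT_4G = 0xFFFFFFFF     # limit taken from a "flat" descriptor


class SegmentCache:
    """Hidden part of a segment register (hypothetical model)."""

    def __init__(self):
        self.base = 0
        self.limit = LIMIT_64K


class ToyCpu:
    """Very rough model of one data segment register (DS)."""

    def __init__(self):
        self.protected = False
        self.ds = SegmentCache()

    def load_ds_real(self, segment):
        # Real Mode load: base = segment * 16, the cached limit is untouched.
        assert not self.protected
        self.ds.base = (segment & 0xFFFF) << 4

    def load_ds_protected(self, base, limit):
        # Protected Mode load: base and limit both come from the descriptor.
        assert self.protected
        self.ds.base = base
        self.ds.limit = limit

    def access_allowed(self, offset):
        # Limit check for a one-byte access at DS:offset.
        return offset <= self.ds.limit


def enter_flat_real_mode(cpu):
    """The sequence described above, applied to DS only."""
    cpu.protected = True
    cpu.load_ds_protected(base=0, limit=LIMIT_4G)
    cpu.protected = False
    cpu.load_ds_real(0x0000)      # base is reset, limit stays at 4G
```

  In this model, an access at offset 0x000A0000 is refused on a fresh ToyCpu
  and accepted after enter_flat_real_mode() has run.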
  
  Intel could not plug this hole in Real Mode. But Virtual 86 mode has no such
  hole. The 64k limit of Virtual 86 addresses is strictly enforced. Since 
  Virtual 86 mode is what your DOS programs are using whenever they run under 
  Windows, Flat Real Mode is out, and so is VOODOO and so is Ultima 7.
  
  -- How U7WIN9X overcomes this --
  --------------------------------
  So how does this work in U7WIN9X ? Basically U7VXD.VXD, a piece of 32 bit
  protected mode code, normally sits waiting. When Ultima 7 attempts a Flat
  Real instruction, the game is interrupted, and the VXD gains control. It
  obtains all the registers from the game at the point of interrupt, builds
  an equivalent image of the forbidden Flat Real instruction and executes
  it on behalf of the game.
  
  Since the VXD is 32 bit protected mode, it has proper access to the whole
  memory range of the game. So the memory access attempted and failed as Flat 
  Real Mode will succeed when rerun by the VXD. When done, the VXD stores 
  the registers and the condition flags back and resumes the game. 
  
  VOODOO in Ultima 7 is simple to handle because when used, all Flat Real Mode 
  instructions have DS and ES set to 0000. This fits nicely with the DS and
  ES within a VXD, because they are Base=0, Limit=4G.
  On the other hand, performance is crucial because VOODOO is used not only
  to access XMS memory allocated above 1M (called VOODOO memory by Ultima 7),
  but also far memory between the end of the EXE and 1M, and this includes
  the 64k of the Video VGA buffer at 0x000A0000. 
  To give a feeling: while the game starts, it displays a shifting mosaic
  pattern, red in Part I, blue in Part II, for a few seconds. During
  this time, VOODOO executes about 5 million Flat Real Mode instructions !!!
  
  This is however a bit too simplified. This is how I hoped initially to make it
  work. In reality, the VXD does not catch the General Protection Fault from 
  the game. Instead it catches BreakPoint Faults from BreakPoint Interrupt
  instructions set to replace all Flat Real Mode instructions in the game.
  
  So the VXD has the additional task, when an EXE of the game has been loaded
  by U7RUN, to patch it in memory at all locations where Flat Real Mode 
  instructions are. It stores in their place a byte 0xCC (Breakpoint interrupt)
  followed by a coding byte which describes the kind of Flat Real Mode
  instruction at that location. 
  
  Then the VXD hooks on the Breakpoint Fault on behalf of the game. When the
  game runs over the byte 0xCC, it is interrupted and the VXD takes control.
  It then uses the coding byte and if needed the remainder of the instruction
  bytes to regenerate the Flat Real Mode instruction and execute it. 
  
  The patching strategy works because every Flat Real Mode instruction is
  2 bytes or more, so that the VXD has the space to store the 0xCC and
  the coding byte. Every Flat Real Mode instruction is 2 bytes or more
  because it needs at least one byte for the opcode and the arguments
  AND one byte for the ADR32=0x67 override.
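
  The patching step can be sketched as follows, in Python rather than in VXD
  code. The table layout (a list of offset / coding byte pairs) and the
  function name are assumptions made for this sketch; the actual U7VXD tables
  and coding byte values are not given in this file.

```python
BREAKPOINT = 0xCC   # one-byte Breakpoint interrupt opcode


def patch_image(image, patch_table):
    """Overwrite the first two bytes of each listed instruction.

    image       -- bytearray holding the loaded EXE (hypothetical layout)
    patch_table -- iterable of (offset, coding_byte) pairs (hypothetical)
    The remainder of each instruction is left exactly as it was.
    """
    for offset, coding_byte in patch_table:
        if not 0x01 <= coding_byte <= 0xFF:
            raise ValueError("coding byte must fit in a byte and not be 0x00")
        if offset < 0 or offset + 1 >= len(image):
            raise ValueError("instruction must be at least 2 bytes long")
        image[offset] = BREAKPOINT
        image[offset + 1] = coding_byte
    return image
```

  For example, patching the three-byte sequence 67 8B 03 with a made-up
  coding byte 05 leaves CC 05 03 in memory: the breakpoint, the coding byte,
  and the untouched remainder of the instruction.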
  
  The reason I did not directly hook the General Protection Fault that would 
  occur if the game were left running the original Flat Real instructions is
  that the VXD would then have a harder time determining whether the GPF
  was a genuine VOODOO event from the game or not, and decoding and
  regenerating the Flat Real Mode instruction.
  
  -- In Action --
  ---------------
  Well this is the overall principle of the heart of U7VXD.VXD. Now for
  the specifics :
  1   U7RUN.COM determines that this is Ultima 7 Part II when it finds SI.EXE
                in its own directory. It assumes Ultima 7 Part I otherwise.
  2   U7RUN.COM loads U7VXD.VXD : U7RUN uses VXDLDR to dynamically load U7VXD.
                Just before returning to DOS, U7RUN similarly uses VXDLDR to 
                unload U7VXD. VXDLDR.VXD is a Windows provided VXD, statically
                loaded and thus always available, with DeviceID=0027, whose
                aim is to service requests to dynamically load or unload VXDs.
  3   U7RUN.COM schedules MAINMENU.EXE, INTRO.EXE, SI.EXE/U7.EXE and ENDGAME.EXE
                according to the return code each one supplies to determine which 
                one should follow. This is how ULTIMA7.COM/SERPENT.COM work.
  4   U7RUN.COM hooks software interrupt 0x21 (DOS calls). When a DOS call comes
                in and this is an Open (0x3D) and the file to be opened is
                EMMXXXX0, then it rejects it with code 2 and carry set. This
                is Ultima 7 inquiring about EMS. Should Ultima 7 find EMS
                active, it would stop with a message complaining about
                protected mode and asking to remove the offending program.
                All other DOS commands are passed through.
  5   U7RUN.COM hooks software interrupt 0x2f (Win calls). This time this is to
                catch XMS calls. When an XMS Allocate command (0x09) comes in,
                U7RUN reflects it to U7VXD which uses _PageAllocate to acquire
                memory within the 32 bit wide address space of the DOS Virtual
                Machine, on behalf of the game. It _PageFree(s) it when the
                corresponding XMS Deallocate command (0x0A) comes in. 
                Ultima 7 also uses an XMS Lock request (0x0C) to fix the XMS
                memory block in memory and obtain its 32 bit address.
                When it receives it, U7RUN issues to U7VXD the request to patch
                the current program along with the identity of the EXE to be
                patched. When completed, U7RUN returns to the game the location
                of the _PageAllocate(d) memory.
                U7RUN handles XMS Unlock (0x0D) as a no-op. 
                U7RUN always returns 1 MByte XMS memory available to XMS Query
                Available Memory (0x08). This value seems to satisfy Ultima 7.
                Less memory would make it slower, more memory would induce it 
                to create and manage privately a Virtual Disk. This is useless
                since Windows handles its own Disk Cache, better than Ultima 7
                can do. 
                All other Win calls are passed through.
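
  Items 4 and 5 amount to two small filters in front of the original
  handlers. The sketch below models them in Python; it is not the U7RUN
  code. The names dos_open_filter, ToyXms and PASS_THROUGH are invented
  here, and the three callbacks stand in for the U7VXD services
  (_PageAllocate, _PageFree and the EXE patch request) described above.

```python
PASS_THROUGH = object()   # sentinel: hand the call to the original handler

XMS_QUERY_FREE, XMS_ALLOCATE, XMS_FREE = 0x08, 0x09, 0x0A
XMS_LOCK, XMS_UNLOCK = 0x0C, 0x0D


def dos_open_filter(ah, filename):
    """Item 4: refuse the EMS probe, pass every other DOS call through."""
    if ah == 0x3D and filename.upper() == "EMMXXXX0":
        return (True, 2)          # carry set, error code 2
    return PASS_THROUGH


class ToyXms:
    """Item 5: the XMS functions that U7RUN answers itself (toy model)."""

    REPORTED_FREE_KB = 1024       # always report 1 MByte available

    def __init__(self, page_allocate, page_free, patch_exe):
        self.page_allocate = page_allocate   # stands in for _PageAllocate
        self.page_free = page_free           # stands in for _PageFree
        self.patch_exe = patch_exe           # stands in for the patch request
        self.blocks = {}                     # handle -> 32 bit address
        self.next_handle = 1

    def call(self, function, arg=0):
        if function == XMS_QUERY_FREE:
            return self.REPORTED_FREE_KB
        if function == XMS_ALLOCATE:         # arg = size requested
            handle = self.next_handle
            self.next_handle += 1
            self.blocks[handle] = self.page_allocate(arg)
            return handle
        if function == XMS_FREE:             # arg = handle
            self.page_free(self.blocks.pop(arg))
            return True
        if function == XMS_LOCK:             # arg = handle
            self.patch_exe()                 # the EXE is patched at Lock time
            return self.blocks[arg]
        if function == XMS_UNLOCK:
            return True                      # no-op
        return PASS_THROUGH
```

  The one design point worth noticing is that the patch request rides on the
  XMS Lock call, which is how the model mirrors the description above.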
  
  6   U7VXD.VXD when loaded, registers the identity of the Virtual Machine that
                started it. It will then react only to requests coming from the
                same virtual machine. It also hooks the Breakpoint interrupt
                from the Virtual 86 mode. When unloaded, it unhooks the Virtual
                86 Breakpoint interrupt.
  7   U7VXD.VXD handles Memory allocate and free requests as the first two cases
                of Breakpoint interrupt 0xCC, followed by coding byte 0x00.
  8   U7VXD.VXD handles EXE patch requests as the last case of Breakpoint 
                interrupt 0xCC, followed by code 0x00. U7VXD has a table with
                the location of all Flat Real instructions for the seven EXEs
                in Ultima 7 that use VOODOO : 
                      MAINMENU.EXE    (Part I and Part II)
                      ENDGAME.EXE     (Part I and Part II)
                      INTRO.EXE       (Part II only, Part I does not use VOODOO)
                      U7.EXE (Part I) and SI.EXE (Part II)
                It patches the EXE in memory and returns. Each patched 
                instruction is 0xCC, followed by a code byte that cannot be 
                0x00, followed by the original remainder of the Flat Real Mode
                instruction.
  
  9   U7VXD.VXD receives control on a BreakPoint interrupt. This is the true
                heart of U7WIN9X. 
                Based on the coding byte and the remainder of the instruction
                it rebuilds the original instruction locally. Well, not exactly.
                Since the VXD is 32 bit mode whereas the game is 16 bit mode,
                the 0x67 ADR32 byte is useless so it is not copied or generated
                and similarly the 0x66 USE32 byte is toggled, that is, it is
                removed if it was present, added if it was not present. 
                The Virtual Machine data registers EAX, EBX, ECX, EDX, ESI, EDI
                and the user part of the condition flags are loaded. Fortunately
                VOODOO does not use either (E)SP or (E)BP in any of its Flat 
                Real instructions. Since also VOODOO sets the ES and DS segment
                registers to 0000 on all Flat Real instructions, U7VXD leaves
                ES and DS to their VXD default.
                The rebuilt instruction is executed, then the registers and
                the flags (for the comparison instructions) are stored back. 
                U7VXD has a fast path for the family of string instructions, 
                because VOODOO uses them very frequently (LODS, MOVS, STOS 
                and the like).
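
  The prefix handling in item 9 can be illustrated with a few lines of
  Python. This sketch works on the bytes of the original 16 bit instruction,
  whereas U7VXD starts from the coding byte and the remainder, so it only
  shows the rule (drop 0x67, toggle 0x66), not the actual rebuild code. The
  function name and the list of other prefixes kept as-is are assumptions of
  the sketch.

```python
ADR32 = 0x67   # address size override
USE32 = 0x66   # operand size override

# Other legacy prefixes (LOCK, REPNE, REP, segment overrides), kept as-is.
OTHER_PREFIXES = {0xF0, 0xF2, 0xF3, 0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65}


def rebuild_for_32bit(insn):
    """Re-encode a 16 bit mode Flat Real instruction for 32 bit mode code.

    0x67 is dropped: 32 bit addressing is the default in 32 bit code.
    0x66 is toggled: removed if present, added if absent, so that the
    operand size stays what the game intended.
    """
    kept = []
    had_use32 = False
    i = 0
    while i < len(insn) and insn[i] in OTHER_PREFIXES | {ADR32, USE32}:
        byte = insn[i]
        if byte == USE32:
            had_use32 = True
        elif byte != ADR32:
            kept.append(byte)
        i += 1
    out = bytearray(kept)
    if not had_use32:
        out.append(USE32)
    out += insn[i:]               # opcode and arguments, unchanged
    return bytes(out)
```

  For instance 67 8B 03 (MOV AX,[EBX] in 16 bit code) becomes 66 8B 03, and
  67 66 8B 03 (MOV EAX,[EBX]) becomes 8B 03.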

| -- U7DPMI In Action --
| ----------------------
|
|     U7DPMI    follows a very similar path to the U7VXD/U7RUN pair above. To
|               avoid repeating every item, I shall summarize the similarities
|               and highlight the differences.
|     U7DPMI    does 1 above in full.
|     U7DPMI    detects DPMI - software interrupt 0x2f code 0x1687, switches
|               itself to Protected 16 Bit Mode and remains so until the game
|               completes.
|     U7DPMI    does 3 and 4 above in full.
|     U7DPMI    does 5 above with some actions being different from U7RUN :
|               On XMS (De)Allocate, it performs a DPMI Memory (De)Allocate
|               software interrupt 0x31 code 0x0501/0x0502. The rest is
|               similar to U7RUN except that the code patch - 8 above - is done
|               from U7DPMI instead of from the VXD. Indeed the patch tables
|               are the reason why U7DPMI is considerably bigger than U7RUN.
|     U7DPMI    creates a DPMI real to protected Callback. This piece of magic
|               is equivalent to a far call from real/virtual mode that runs
|               a routine in protected mode. Here the routine is the protected
|               breakpoint interrupt handler, part of U7DPMI. Then the callback
|               is set as real/virtual Interrupt 3 = Breakpoint by another DPMI
|               request.
|     U7DPMI    handles the breakpoint interrupts similarly to 9 above except
|               that the handler is 16 bit mode instead of 32 bit. This
|               simplifies somewhat the instruction rebuild.
|
| The crux of the performance in U7DPMI is the way Windows implements a 
| DPMI callback from real/virtual to protected mode. The far call cannot switch
| to protected mode by itself. So its implementation goes into the Virtual 
| Machine Manager (VMM.VXD in Win9x, NTVDM in NT/Win2k) and involves a fair
| amount of processing.